Skip to content

Comments

feat: audio filter plugin#350

Merged
typester merged 38 commits intomainfrom
typester/audio-filter-plugin
Mar 1, 2025
Merged

feat: audio filter plugin#350
typester merged 38 commits intomainfrom
typester/audio-filter-plugin

Conversation

@typester
Copy link
Contributor

The rust-sdk implementation (livekit/rust-sdks#559) must be merged before this pull request.

@bcherry bcherry self-requested a review January 28, 2025 07:19
Copy link
Contributor

@bcherry bcherry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, lmk when it's ready for final review

@typester typester marked this pull request as ready for review February 6, 2025 21:17
@typester typester changed the title [WIP] feat: audio filter plugin feat: audio filter plugin Feb 6, 2025
Comment on lines 58 to 59
enable_filter: Any = None,
filter_options: dict[str, Any] | None = None,
Copy link
Member

@theomonnom theomonnom Feb 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be more eager to add strict typings here.
I think we should remove the filter_options and put it inside the KrispFilter constructor.
I was imagining something like:

rtc.AudioStream(krisp=KrispFilter(options)) # options can be automatically determined by our third package that will find the dynamic lib inside our wheel

I'm not sure why we make it a generic interface today, feels easier to explicitly mention Krisp, wdyt?

EDIT: another option is to mention Enhanced Noise Suppression or Background Voice Cancellation instead of Krisp

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I'm not familiar with Python, any suggestion like this is super welcome! Let me think about this

Comment on lines 20 to 23
{
"url": url,
"token": token,
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we make it strictly typed inside our protobufs definition instead of using json? The auth function could accept those as arguments (no need for a prost dependency)

def __init__(
self,
loop: Optional[asyncio.AbstractEventLoop] = None,
filters: Optional[List[Any]] = None,
Copy link
Member

@theomonnom theomonnom Feb 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the constructor is tricky because users don't construct it on agents, we do it for them. IMO we should only expose this API inside the rtc.AudioStream for now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think so. But the Krisp filter needs to be initialized with the LIVEKIT_URL and TOKEN parameters to verify the token. So, I've initialized it here.

Instead, do you mean that it would be better to pass all of this information in the AudioStream constructor, as you suggested earlier? Or creating a separate init function to pass these info? Something like:

krisp.init(url, token)

or something like that. I think the latter is better

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no strong opinion, tho I know dz wanted to avoid the auth on the python side (so grabbing the url/token on the rust side)

@typester typester requested a review from theomonnom February 12, 2025 20:04
capacity: int = 0,
sample_rate: int = 48000,
num_channels: int = 1,
audio_filter: Optional[Tuple[str, dict[str, any]]] = None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any way to get something that looks more like #350 (comment)
This type isn't self-explanatory

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I intended to cover this in the filter module side, because the option parameter will vary depending on the filter module. It looks something like this:

audio_filter=noise_filter.NC()

Please refer to the noise_filter repo for more details

Copy link
Member

@theomonnom theomonnom Feb 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why we want to make this generic, if we have more filters in the future we could also add new keyword arguments, I propose to change the argument name to be directly krisp

I also doubt we're going to have more filters? (Noise cancellationEcho cancellation should be exposed directly to Python with raw API). Because we need to use it on external speakers not available on the Rust side (not tied to any rtc.AudioStream ― I wouldn't design for it inside this PR)

I think we should narrow down the API for now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exposing the raw API is better in terms of architecture, and I agreed with this in our previous discussion. But it ended up with it's not good with our metric system, wasn't it? So we decided to extend AudioStream for now.

In that case, I would say that it would be better to handle the filter options in noise_filter side and to keep AudioStream generic. This would make updates to noise_filter easier. What do you think?

Copy link
Member

@theomonnom theomonnom Feb 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I meant the echo cancellation in the previous message.

What I mean is why we want to keep the name generic like audio_filter instead of krisp
e.g krisp=KrispFilter(options)
or could also be noise_cancellation=krisp.AudioFilter(options)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we really need to add this requirement to add the filters inside the RoomOptions? Can registering be skipped and happen implicitly?

Yes, I think we can do that. I hadn't considered that option, but I will if that's preferable.

Also, regarding the naming of livekit-plugins-krisp, if that's the case, I need to follow the common implementation rules for livekit-plugin, correct? I would really appreciate it if you could give me some pointers on that.

Copy link
Member

@theomonnom theomonnom Feb 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it isn't that much work to do, I think it is worth it. This way, people only need to focus on a single spot in their code to enable Krisp.

The plugin would need to be contributed inside the agents repository, we have a livekit-plugins-minimal example here.

Copy link
Contributor Author

@typester typester Feb 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it isn't that much work to do, I think it is worth it. This way, people only need to focus on a single spot in their code to enable Krisp.

Agreed. Will try to make it.

For the plugin repository, is it possible to implement a plugin in a separate repository? The krisp filter plugin needs to be bundled with non-opensource rust-ffi. so I don't think it's a good idea to put it in the plugin repo, as it would likely require implementing some tricky CIs

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I think the plugin should exist in the krisp filters repo, since it's not the same license & closed sourced. we just have to publish a package that's livekit-plugins-krisp, and that should be enough.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I did the following changes:

  • remove audio_filter parameters from RoomOptions, now filters are automatically registered.
  • rename audio_filter parameter of rtc.NativeStream parameter to krisp.

So now we can do:

from livekit.plugins import krisp

# ...

stream = rtc.AudioStream.from_participant(
    participant=participant,
    track_source=rtc.TrackSource.SOURCE_MICROPHONE,
    krisp=krisp.BVC(),
 )

Or something like that.

capacity: int = 0,
sample_rate: int = 48000,
num_channels: int = 1,
noise_cancellation: Optional[Tuple[str, dict[str, Any]]] = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what the resolution in the other comment thread was but I'd suggest at least adding a type alias for this e.g. NoiseCancellationOptions so it's clear

also please add documentation to the docstring below

Copy link
Member

@theomonnom theomonnom Feb 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I think doing dataclass would be great

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added NoiseCancellationOptions as a dataclass and exported it for using it by noise-cancellation module.

Also a brief documentation was added!

capacity: int = 0,
sample_rate: int = 48000,
num_channels: int = 1,
noise_cancellation: Optional[Tuple[str, dict[str, Any]]] = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, please document the new parameter

capacity: int = 0,
sample_rate: int = 48000,
num_channels: int = 1,
noise_cancellation: Optional[Tuple[str, dict[str, Any]]] = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, please document the new parameter

Copy link
Member

@theomonnom theomonnom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

otherwise lgtm!! 🚀

@dataclass
class NoiseCancellationOptions:
module_id: str
options: dict[str, Any]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What should the options include? This is the final unclear point.

Copy link
Contributor Author

@typester typester Feb 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The options are eventually encoded to JSON and will be passed to the FFI as Cstr.

The noise-cancellation module should have functions like:

def BVC() -> rtc.NoiseCancellationOptions:

And users this this type via the function like:

from livekit.plugins import noise_cancellation
# ...

rtc.AudioStream(
  # ...
  noise_cancellation=noise_cancellation.BVC(),
)

So users should not create this Options by their hand.

self._remote_participants[participant.identity] = participant
return participant

def _filter_handle(self, plugin) -> int | None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems unused? Or maybe it is on the krisp-repo?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, this is unused now. It was used for associate room and filters, but it was moved to rust-sdk. I'll remove this.

@typester typester merged commit 132e562 into main Mar 1, 2025
1 check passed
@typester typester deleted the typester/audio-filter-plugin branch March 1, 2025 00:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants